SELECTING PROCEDURES FOR OPTIMAL SUBSET
OF REGRESSION VARIABLES*


by 

Shanti  S.  Gupta,  Purdue  University 
and 

Deng-Yuan  Huang,  National  Taiwan  Normal  University 


Recently,  a  number  of  methods  have  been  developed  for  selecting  the 
"best"  or  at  least  a  "good"  subset  of  variables  in  regression  analysis. 

For various reasons, we may be interested in including only a subset, say,
of size r < p, the number of independent variables. Various authors have
considered  this  problem  and  a  variety  of  techniques  are  presently  being 
used  to  construct  such  subsets. 

Arvesen  and  McCabe  (1975)  proposed  a  procedure  for  selecting  a  subset 
within  a  class  of  subsets  with  t  (fixed)  independent  variables,  taking  into 
account  the  statistical  variation  of  the  residual  mean  squares.  Huang  and 
Panchapakesan (1982) proposed a selection procedure based on the expected
residual  sums  of  squares.  Hsu  and  Huang  (1982)  studied  a  sequential  selection 
procedure  for  good  regression  models. 

In this paper, we are interested in deriving an optimal decision procedure
based  on  residual  mean  squares  to  select  a  subset  excluding  all  "inferior" 
independent  variables.  This  kind  of  optimality  criterion  is  related  to  the 
approach  of  Gupta  and  Huang  (1977). 

Let $\pi_0, \pi_1, \dots, \pi_k$ denote $k+1$ normal populations with unknown variances $\sigma_0^2, \sigma_1^2, \dots, \sigma_k^2$, respectively. Assume that $\sigma_0^2$ is known. A population (model) $\pi_i$ is said to be superior (or good) if $\sigma_i^2 < \Delta \sigma_0^2$, and to be inferior (or bad) if $\sigma_i^2 \ge \Delta \sigma_0^2$, where $\Delta$ is a specified constant greater than 1. Let $\Omega$ be the parameter space, which is the collection of all possible parameters.

*This research was supported by a grant from the National Science Council of
the Republic of China. It was also supported by the Office of Naval Research
Contract N00014-75-C-0455 at Purdue University.

Let CD stand for a correct decision, which is defined to be the
selection of any subset which excludes all the inferior populations.
Assume the following model:


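(1)  $Y = X\beta + \varepsilon$

where $X = [\mathbf{1}, X_1, \dots, X_{p-1}]$ is an $n \times p$ known matrix of rank $p < n$, $\beta' = (\beta_0, \beta_1, \dots, \beta_{p-1})$ is a $1 \times p$ parameter vector, $\varepsilon \sim N(0, \sigma_0^2 I_n)$, $\mathbf{1}' = (1, \dots, 1)_{1 \times n}$, and $I_n$ is the $n \times n$ identity matrix.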

In what follows, (1), which has $p-1$ independent variables, will be viewed
as the true model. Without loss of generality we can assume that $\sigma_0^2 = 1$.
Consider the models, for any $r$, $2 \le r \le p-1$,


(2)  $Y = X_{ri}\beta_{ri} + \varepsilon_{ri}$

where $X_{ri}$ is an $n \times r$ matrix of rank $r$ with $\mathbf{1}' = (1, \dots, 1)_{1 \times n}$, $\beta_{ri}$ is an $r \times 1$ parameter vector, and $\varepsilon_{ri} \sim N(0, \sigma_{ri}^2 I_n)$, $i = 1, 2, \dots, k_r$ $\left(= \binom{p-1}{r-1}\right)$. Let $k = \sum_{r=2}^{p-1} k_r$. It should be noted that in stating the reduced model (2), our comparisons of models are made under the true model assumptions. The goal is to include all the designs $X_{ri}$ (or sets of independent variables) associated with $\sigma_{[j]}^2$, $j = 1, \dots, k-t$, where $\sigma_{[1]}^2 \le \sigma_{[2]}^2 \le \dots \le \sigma_{[k-t]}^2$ are ordered values from some of the $\sigma_{ri}^2$'s, $i = 1, \dots, k_r$, $r = 2, \dots, p-1$.


Note that for any $r$, $2 \le r \le p-1$,

$SS_{ri} = Y'\{I_n - X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}'\}Y = Y'Q_{ri}Y$,

$SS_{ri} \sim \chi^2\{\nu_r, (X\beta)'Q_{ri}(X\beta)/2\}$

(under the true model), where $\nu_r = n - r$, for $1 \le i \le k_r$. Note that the
noncentrality parameter, in general, is not zero, and that

$\sigma_{ri}^2 = 1 + (X\beta)'Q_{ri}(X\beta)/\nu_r$.
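The expression for $\sigma_{ri}^2$ can be checked from the mean of a quadratic form. The short derivation below assumes (the formula suggests it, but the text does not say so explicitly) that $\sigma_{ri}^2$ is the expected residual mean square $E(SS_{ri}/\nu_r)$ under the true model with $\sigma_0^2 = 1$:

```latex
\begin{align*}
E(SS_{ri}) &= E(Y'Q_{ri}Y) = \operatorname{tr}(Q_{ri}) + (X\beta)'Q_{ri}(X\beta) \\
           &= (n - r) + (X\beta)'Q_{ri}(X\beta) = \nu_r + (X\beta)'Q_{ri}(X\beta), \\
\sigma_{ri}^2 &= E(SS_{ri}/\nu_r) = 1 + (X\beta)'Q_{ri}(X\beta)/\nu_r .
\end{align*}
```

Here $\operatorname{tr}(Q_{ri}) = n - r$ because $Q_{ri}$ is idempotent of rank $n - r$. The second term vanishes exactly when $X\beta$ lies in the column space of $X_{ri}$, in which case the reduced model loses nothing relative to the true model.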


Now we need some notation. Deleting a set of $\beta_j$'s without specifying which ones are deleted, we use $ri$ to denote the special subset that is not deleted. For example, if $p = 3$, $r = 2$ then there are three subsets of size 2; namely, $\{\beta_1, \beta_2\}$, $\{\beta_1, \beta_3\}$ and $\{\beta_2, \beta_3\}$. Then $r1$ denotes the set $\{\beta_1, \beta_2\}$, $r2$ denotes $\{\beta_1, \beta_3\}$ and $r3$ denotes $\{\beta_2, \beta_3\}$. Then, we use $\beta$ to denote the vector associated with the following subsets: $\{\beta_1, \beta_2\}$ with $(\beta_1, \beta_2, 0)$, $\{\beta_1, \beta_3\}$ with $(\beta_1, 0, \beta_3)$, and $\{\beta_2, \beta_3\}$ with $(0, \beta_2, \beta_3)$, where 0 is the parameter value which is omitted from the true model of the appropriate $\beta_j$'s. Thus, in the following, we will use $\Omega_{0,ri}$ to denote those sets of $\beta$ as described above with the further condition that $\sigma_{ri}^2 = \sigma_0^2 = 1$. Similarly, $\Omega_{1,ri}$ will be used to denote the sets of $\beta$ as described above with the further restriction that $\sigma_{ri}^2 \ge \Delta$.
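The labelling in the example can be reproduced mechanically. The following is a minimal sketch, not taken from the paper: the function name `padded_subsets` and the string labels `b1`, `b2`, `b3` (standing for the retained coefficients) are illustrative assumptions, and the subsets are listed in lexicographic order.

```python
from itertools import combinations

def padded_subsets(p, r):
    """Map each label (r, i) to the retained indices and the zero-padded vector.

    Retained coefficients are shown by name ('b1', 'b2', ...); deleted
    coefficients are replaced by 0, as in the example with p = 3, r = 2.
    """
    result = {}
    for i, kept in enumerate(combinations(range(1, p + 1), r), start=1):
        pattern = tuple(f"b{j}" if j in kept else 0 for j in range(1, p + 1))
        result[(r, i)] = (kept, pattern)
    return result

subsets = padded_subsets(3, 2)
for label, (kept, pattern) in subsets.items():
    print(label, kept, pattern)
```

For `p = 3`, `r = 2` this lists three subsets, with the deleted coefficient replaced by 0 in each padded vector.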

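Formally, we write

$\Omega_{0,ri} = \{\beta : \sigma_{ri}^2 = 1\}$

and

$\Omega_{1,ri} = \{\beta : \sigma_{ri}^2 \ge \Delta\}$,

where $i = 1, \dots, k_r$; $r = 2, \dots, p-1$, and let

$\Omega_1 = \bigcup_{r=2}^{p-1} \bigcup_{i=1}^{k_r} \Omega_{1,ri}$ and $\Omega_0 = \bigcap_{r=2}^{p-1} \bigcap_{i=1}^{k_r} \Omega_{0,ri}$.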
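To make the two conditions concrete, the sketch below computes $\sigma_{ri}^2 = 1 + (X\beta)'Q_{ri}(X\beta)/\nu_r$ for a toy design and compares it with $\Delta$. It is an illustration only: the design, the value of $\beta$ and the helper names are assumptions, and the projection is done by plain Gram-Schmidt (which presumes the columns of $X_{ri}$ are linearly independent) rather than by forming $Q_{ri}$ explicitly.

```python
def residual_norm_sq(columns, mu):
    """Squared norm of mu after removing its projection onto span(columns)."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:                      # orthogonalize against earlier columns
            coef = sum(a * b for a, b in zip(q, v))
            v = [a - coef * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5  # nonzero if columns are independent
        basis.append([a / norm for a in v])
    resid = list(mu)
    for q in basis:                          # resid = Q_ri * mu
        coef = sum(a * b for a, b in zip(q, resid))
        resid = [a - coef * b for a, b in zip(resid, q)]
    return sum(a * a for a in resid)

def sigma2(columns, mu, n):
    """sigma_ri^2 = 1 + mu' Q_ri mu / nu_r with nu_r = n - r and sigma_0^2 = 1."""
    nu = n - len(columns)
    return 1.0 + residual_norm_sq(columns, mu) / nu

# Toy design (hypothetical): n = 4, intercept plus two orthogonal regressors.
ones = [1.0, 1.0, 1.0, 1.0]
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
mu = [o + 2.0 * a for o, a in zip(ones, x1)]   # X beta with beta = (1, 2, 0)

delta = 2.0
for name, cols in {"{1, x1}": [ones, x1], "{1, x2}": [ones, x2]}.items():
    s2 = sigma2(cols, mu, n=4)
    print(name, s2, "inferior" if s2 >= delta else "superior")
```

In this toy case the subset that keeps `x1` has $\sigma_{ri}^2 = 1$, while the subset that drops it has $\sigma_{ri}^2 \ge \Delta$ and so satisfies the restriction defining $\Omega_{1,ri}$.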


Let $g_{\sigma_{ri}^2}(s_{ri})$ denote the probability density of $S_{ri}$ depending on the parameter $\sigma_{ri}^2$, where $S_{ri} = SS_{ri}/\nu_r$, $i = 1, \dots, k_r$; $r = 2, \dots, p-1$.


Consider a family of hypotheses testing problems as follows:

(3)  $H_{0,ri}: \beta \in \Omega_{0,ri}$  vs  $K_{ri}: \beta \in \Omega_{1,ri}$,

$i = 1, \dots, k_r$, $r = 2, \dots, p-1$. A test of the hypotheses (3) will be defined
to be a vector $(\varphi_1(y), \dots, \varphi_k(y))$, where the elements of the vector are
ordinary test functions; when $y$ is observed we reject $H_{0t}$ with probability
$\varphi_t(y)$, $1 \le t \le k$. The power function of a test $(\varphi_1, \dots, \varphi_k)$ is defined to be
the vector $(p_1(\beta), \dots, p_k(\beta))$ where

$p_t(\beta) = E_\beta \varphi_t(Y)$,

$1 \le t \le k$. Let $S(\gamma)$ be the set of all tests $(\varphi_1, \dots, \varphi_k)$ such that

(4)  $E_\beta \varphi_t(Y) \le \gamma$,  $\beta \in \Omega_0$.


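We define $\varphi^0 = (\varphi_1^0, \dots, \varphi_k^0)$ as

$\varphi_{ri}^0(y) = 1$ if $g_\Delta(s_{ri}) \ge c\, g_1(s_{ri})$,
$\varphi_{ri}^0(y) = 0$ if $g_\Delta(s_{ri}) < c\, g_1(s_{ri})$,

such that $E_\beta \varphi_{ri}^0(Y) = \gamma$, $\beta \in \Omega_0$, where $s_{ri}$ is the observed value of $S_{ri}$.

It can be shown that $\varphi^0$ maximizes

$\min_{1 \le t \le k} \inf_{\beta \in \Omega_{1,t}} E_\beta \varphi_t(Y)$

among all tests $\varphi = (\varphi_1, \dots, \varphi_k) \in S(\gamma)$ (cf. Gupta and Huang (1977)).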
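A numerical sketch of one component $\varphi_{ri}^0$ may help fix ideas. It rests on assumptions that go beyond the text above: $S_{ri}$ is taken to be $SS_{ri}/\nu_r$ with $\sigma_0^2 = 1$, the ratio $g_\Delta/g_1$ is evaluated through the standard Poisson-mixture expansion of the noncentral chi-square density with noncentrality $(\Delta-1)\nu_r/2$, and $c$ is calibrated by using the fact that this ratio is increasing in $s$, so that the rejection region is an upper tail of $S_{ri}$. The function names are illustrative, and the chi-square quantile is obtained by bisection on a series for the incomplete gamma function to keep the example within the standard library.

```python
import math

def likelihood_ratio(s, nu, delta, tol=1e-15):
    """g_Delta(s) / g_1(s) for S = SS / nu, summed term by term."""
    lam = (delta - 1.0) * nu / 2.0      # noncentrality giving E[S] = delta
    x = lam * nu * s / 2.0
    term, total, l = math.exp(-lam), 0.0, 0
    while True:
        total += term
        l += 1
        denom = l * (nu / 2.0 + l - 1.0)
        term *= x / denom               # ratio of consecutive series terms
        if x < denom and term < tol * total:
            return total

def chi2_cdf(x, nu):
    """Central chi-square CDF via the power series of the incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total, n = 0.0, 0
    while term > 1e-16 * max(total, 1e-300):
        total += term
        n += 1
        term *= z / (a + n)
    return min(total, 1.0)

def chi2_quantile(prob, nu):
    """Bisection for the chi-square quantile."""
    lo, hi = 0.0, float(nu)
    while chi2_cdf(hi, nu) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def calibrate_c(nu, delta, gamma):
    """Return (s_star, c) with P(S >= s_star) = gamma when SS is central chi-square."""
    s_star = chi2_quantile(1.0 - gamma, nu) / nu
    return s_star, likelihood_ratio(s_star, nu, delta)

def phi0(s, nu, delta, c):
    """1 = reject H_0 for this subset, 0 = do not reject."""
    return 1 if likelihood_ratio(s, nu, delta) >= c else 0

s_star, c = calibrate_c(nu=10, delta=2.0, gamma=0.05)
print(s_star, c, phi0(1.2, 10, 2.0, c), phi0(2.5, 10, 2.0, c))
```

Because the ratio is monotone in $s$, comparing it with $c$ is equivalent to comparing the observed $s_{ri}$ with `s_star`; the ratio form is kept here only to mirror the definition of $\varphi_{ri}^0$.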

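To determine the constant $c$, we proceed as follows: for a given $\eta$
there exists a smallest positive integer $k_0$ such that

$\dfrac{a_{k_0+1}(s_{ri})}{a_{k_0}(s_{ri})} < 1$  and  $\dfrac{a_{k_0+1}(s_{ri})}{a_{k_0}(s_{ri})} + \dfrac{a_{k_0}(s_{ri})}{\eta} < 1$,

where

$a_\ell(s_{ri}) = \dfrac{e^{-\lambda_r}\, \lambda_r^{\ell}\, (\nu_r s_{ri})^{\ell}\, \Gamma(\nu_r/2)}{\ell!\; 2^{\ell}\; \Gamma(\nu_r/2 + \ell)}$,

$\ell = 0, 1, 2, \dots$; $\lambda_r = \dfrac{(\Delta - 1)\nu_r}{2}$. For this $k_0$, it can be shown that

$\dfrac{g_\Delta(s_{ri})}{g_1(s_{ri})} - \sum_{\ell=0}^{k_0-1} a_\ell(s_{ri}) = \sum_{k=0}^{\infty} a_{k+k_0}(s_{ri}) < \eta$,

$\dfrac{g_\Delta(s_{ri})}{g_1(s_{ri})} = \sum_{\ell=0}^{\infty} a_\ell(s_{ri})$.

Thus, approximately,

$\dfrac{g_\Delta(s_{ri})}{g_1(s_{ri})} \approx \sum_{\ell=0}^{k_0-1} a_\ell(s_{ri})$

with error less than $\eta$. For $\beta \in \Omega_0$,

$E_\beta \varphi_{ri}^0(Y) = P_\beta\{g_\Delta(S_{ri}) \ge c\, g_1(S_{ri})\}$

$\approx P_\beta\left\{\sum_{\ell=0}^{k_0-1} a_\ell(S_{ri}) \ge c\right\}$

$= \int_0^\infty I_{\left\{\sum_{\ell=0}^{k_0-1} a_\ell(s_{ri}) \ge c\right\}}(s_{ri})\, g_1(s_{ri})\, ds_{ri} = \gamma$,

where $g_1(s_{ri})$ is the density of $S_{ri}$ when $SS_{ri}$ has the central $\chi^2$ distribution with $\nu_r$ degrees of freedom, and $I_A(x) = 0$
for $x \notin A$, $I_A(x) = 1$ for $x \in A$. The constant $c$ can be determined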